SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Boehringer Mannheim GmbH 

(B) ROAD: Sandhofer Str. 112-132 

(C) CITY: Mannheim 

(E) COUNTRY: Germany 

(F) POSTAL CODE: 68305 

(ii) TITLE OF APPLICATION: Recombinant antigen from the NS3 region of the hepatitis C virus

(iii) NUMBER OF SEQUENCES: 8 

(iv) COMPUTER READABLE FORM: 

(A) DATA CARRIER: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPA)

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 885 base pairs 

(B) TYPE: nucleic acid 

(C) STRAND FORM: both 

(D) TOPOLOGY: linear 

(ii) TYPE OF MOLECULE: cDNA 



(vi) INITIAL ORIGIN:

(A) ORGANISM: hepatitis C virus 



(viii) POSITION IN THE GENOME: 

(A) CHROMOSOME/SEGMENT: NS3

(ix) CHARACTERISTICS: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..885 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 



ATG ACC ATG ATT ACG AAT TCC CGG GGA TCC ATC ATG AAA TCC CCG GTG 
Met Thr Met Ile Thr Asn Ser Arg Gly Ser Ile Met Lys Ser Pro Val
1 5 10 15 

TTC ACG GAT AAC TCC TCT CCA CCG GTA GTG CCC CAG AGC TTC CAG GTG 
Phe Thr Asp Asn Ser Ser Pro Pro Val Val Pro Gln Ser Phe Gln Val
20 25 30

GCT CAC CTG CAT GCT CCC ACA GGC AGC GGC AAG AGC ACC AAG GTC CCG 
Ala His Leu His Ala Pro Thr Gly Ser Gly Lys Ser Thr Lys Val Pro 
35 40 45 

GCT GCA TAC GCA GCT CAG GGC TAC AAG GTG CTA GTG CTC AAC CCT TCT 
Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro Ser
50 55 60 

GTT GCT GCA ACA TTG GGC TTT GGT GCC TAC ATG TCC AAG GCT CAT GGG 
Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His Gly 
65 70 75 80

ATC GAT CCT AAC ATC AGG ACC GGG GTG AGA ACA ATT ACC ACT GGC AGC
Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly Ser
85 90 95

CCC ATT ACG TAC TCC ACT TAC GGC AAG TTT CTT GCC GAC GGC GGG TGC 
Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly Cys
100 105 110 

GCA GGG GGT GCT TAT GAC ATA ATA ATT TGT GAC GAG TGC CAC TCC ACG 
Ala Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser Thr
115 120 125 

GAT GCC ACA TCC ATC TTG GGC ATC GGC ACT GTC CTT GAC CAA GGA GAG
Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Gly Glu
130 135 140 




ACT GCG GGG GCG AAA TTC* GTT GTG TTC GCC ACC GCC AC^ CCT CCG GGC 480
Thr Ala Gly Ala Lys Leu Val Val Phe Ala Thr Ala Thr Pro Pro Gly
145 150 155 160 

TCC GTC ACT GTG CCC CAT CCC AAC ATT GAG GAG GTT GCT CTA TCC ACC 528 
Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser Thr
165 170 175

ACC GGA GAG ATC CCT TTT TAC GGC AAG GCT ATC CCC CTT GAG GTA ATC 576
Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val Ile
180 185 190 

AAG GGG GGG AGA CAT CTC ATC TTC TGT CAT TCA AAG AGG AAG TGC GAT 624
Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Arg Lys Cys Asp
195 200 205 

GAG CTC GCC ACA AAG CTG GTC GCA ATG GGC ATC AAT GCC GTG GCC TAC 672 
Glu Leu Ala Thr Lys Leu Val Ala Met Gly Ile Asn Ala Val Ala Tyr
210 215 220 

TAC CGC GGT CTT GAC GTG TCC GTC ATC CCG ACC AGC GGT GAT GTT GTC 720 
Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val Val
225 230 235 240

GTC GTG GCA ACC GAC GCC CTC ATG ACC GGC TAT ACC GGC GAC TTC GAC 768 
Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe Asp
245 250 255

TCG GTG ATA GAC TGC AAC ACG TGT GTC ACT CAG ACA GTC GAT TTC AGC 816 
Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe Ser
260 265 270 

CTT GAC CCT ACC TTC ACC ATT GAG ACG ACC ACA CTT CCC CAG GAT GCT 864 
Leu Asp Pro Thr Phe Thr Ile Glu Thr Thr Thr Leu Pro Gln Asp Ala
275 280 285 

GTC TCC CGC ACT CAA CGA CGG 885 
Val Ser Arg Thr Gln Arg Arg
290 295 
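
The codon rows of SEQ ID NO: 1 can be checked against the three-letter rows printed beneath them. The following is a minimal sketch, assuming the standard genetic code; the names CODON_TABLE, THREE_LETTER and translate_row are hypothetical helpers introduced here for illustration and are not part of the listing.

```python
# Standard genetic code, laid out in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# One-letter to three-letter amino acid codes ("*" marks a stop codon).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate_row(row):
    """Translate a whitespace-separated row of codons into three-letter codes."""
    return " ".join(THREE_LETTER[CODON_TABLE[codon]] for codon in row.split())


# First and last codon rows of SEQ ID NO: 1, copied from the listing.
first_row = "ATG ACC ATG ATT ACG AAT TCC CGG GGA TCC ATC ATG AAA TCC CCG GTG"
last_row = "GTC TCC CGC ACT CAA CGA CGG"

print(translate_row(first_row))
print(translate_row(last_row))
print(885 // 3)  # number of codons in a CDS located at 1..885
```

Under these assumptions the two rows reproduce the amino acid lines printed for positions 1 to 16 and 289 to 295, and 885 bases correspond to the 295 residues stated for SEQ ID NO: 2.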



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 295 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) TYPE OF MOLECULE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
Met Thr Met Ile Thr Asn Ser Arg Gly Ser Ile Met Lys Ser Pro Val
1 5 10 15

Phe Thr Asp Asn Ser Ser Pro Pro Val Val Pro Gln Ser Phe Gln Val
20 25 30

Ala His Leu His Ala Pro Thr Gly Ser Gly Lys Ser Thr Lys Val Pro
35 40 45

Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro Ser
50 55 60

Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His Gly
65 70 75 80

Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly Ser
85 90 95

Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly Cys
100 105 110

Ala Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser Thr
115 120 125

Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Gly Glu
130 135 140

Thr Ala Gly Ala Lys Leu Val Val Phe Ala Thr Ala Thr Pro Pro Gly
145 150 155 160

Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser Thr
165 170 175

Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val Ile
180 185 190

Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Arg Lys Cys Asp
195 200 205

Glu Leu Ala Thr Lys Leu Val Ala Met Gly Ile Asn Ala Val Ala Tyr
210 215 220

Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val Val
225 230 235 240

Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe Asp
245 250 255

Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe Ser
260 265 270

Leu Asp Pro Thr Phe Thr Ile Glu Thr Thr Thr Leu Pro Gln Asp Ala
275 280 285

Val Ser Arg Thr Gln Arg Arg
290 295



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRAND FORM: single 

(D) TOPOLOGY: linear 

(ii) TYPE OF MOLECULE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3 

AAGGGATCCA TCATGAAATC CCCGGTGTTC ACGGATAACT 
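
As an illustration of how SEQ ID NO: 3 relates to the start of SEQ ID NO: 1, the sketch below searches the first two codon rows of SEQ ID NO: 1 for the longest 3' portion of the oligonucleotide that occurs in them verbatim. It is only a string comparison on the sequences as printed; the function name longest_suffix_match is a hypothetical helper, and no claim is made here about how the oligonucleotide was used.

```python
# First two codon rows of SEQ ID NO: 1 (positions 1..96), spaces removed.
cds_start = (
    "ATG ACC ATG ATT ACG AAT TCC CGG GGA TCC ATC ATG AAA TCC CCG GTG "
    "TTC ACG GAT AAC TCC TCT CCA CCG GTA GTG CCC CAG AGC TTC CAG GTG"
).replace(" ", "")

# SEQ ID NO: 3 as printed in blocks of ten, spaces removed.
primer3 = "AAGGGATCCA TCATGAAATC CCCGGTGTTC ACGGATAACT".replace(" ", "")


def longest_suffix_match(primer, template):
    """Return (primer_offset, template_index) for the longest primer suffix
    found verbatim in template, or None if no suffix occurs."""
    for offset in range(len(primer)):
        index = template.find(primer[offset:])
        if index != -1:
            return offset, index
    return None


match = longest_suffix_match(primer3, cds_start)
offset, index = match
print(len(primer3))                          # stated length is 40
print(offset, index)                         # 0-based offsets
print(index + 1, index + len(primer3) - offset)  # 1-based span in SEQ ID NO: 1
```

With the sequences as printed, all but the first two bases of SEQ ID NO: 3 are found in SEQ ID NO: 1, spanning positions 24 to 61 in 1-based numbering.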

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRAND FORM: single 

(D) TOPOLOGY: linear 

(ii) TYPE OF MOLECULE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4 

GGGAAGCCTT AATTCTTACC GTCGTTGAGT GCGGGAGAC 
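
SEQ ID NO: 4 can be compared with the 3' end of SEQ ID NO: 1 by taking its reverse complement. The sketch below does this with plain string operations; reverse_complement is a hypothetical helper, and the comparison is an observation on the printed sequences rather than a statement about the experimental design.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.translate(complement)[::-1]


# SEQ ID NO: 4 as printed in blocks of ten, spaces removed.
primer4 = "GGGAAGCCTT AATTCTTACC GTCGTTGAGT GCGGGAGAC".replace(" ", "")

# Last codon row of SEQ ID NO: 1 (positions 865..885), spaces removed.
cds_tail = "GTC TCC CGC ACT CAA CGA CGG".replace(" ", "")

rc = reverse_complement(primer4)
print(rc)
print(rc.startswith(cds_tail))                     # does it begin with the CDS end?
print(rc[len(cds_tail):len(cds_tail) + 3])         # the three bases that follow
```

On the printed sequences, the reverse complement begins with the final 21 bases of SEQ ID NO: 1, and the next three bases are TAA, a stop codon in the standard genetic code.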

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRAND FORM: single 



(D) TOPOLOGY: linear 
(ii) TYPE OF MOLECULE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5

GAGGGATCCA TCATGAAAGC GGTGGACTTT ATCCCTGTG 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRAND FORM: single 

(D) TOPOLOGY: linear 

(ii) TYPE OF MOLECULE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6

GAGAAGCTTT TAACACGTGT TGCAGTCTAT CAC 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRAND FORM: single 

(D) TOPOLOGY: linear 

(ii) TYPE OF MOLECULE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7 
GAGGGATCCA TCATGAAACA CCTGCATGCT CCCACCGGC 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRAND FORM: single 

(D) TOPOLOGY: linear 

(ii) TYPE OF MOLECULE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8 



GAGAAGCTTT TAATACCAAG CACAGCCTGC GTC 
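
The LENGTH fields of SEQ ID NO: 3 to 8 can be cross-checked against the printed sequences. This is a minimal sketch; the dictionary name OLIGOS and the function check_lengths are hypothetical, and the data are copied from the records above.

```python
# SEQ ID NO -> (stated length in base pairs, sequence as printed in blocks of ten)
OLIGOS = {
    3: (40, "AAGGGATCCA TCATGAAATC CCCGGTGTTC ACGGATAACT"),
    4: (39, "GGGAAGCCTT AATTCTTACC GTCGTTGAGT GCGGGAGAC"),
    5: (39, "GAGGGATCCA TCATGAAAGC GGTGGACTTT ATCCCTGTG"),
    6: (33, "GAGAAGCTTT TAACACGTGT TGCAGTCTAT CAC"),
    7: (39, "GAGGGATCCA TCATGAAACA CCTGCATGCT CCCACCGGC"),
    8: (33, "GAGAAGCTTT TAATACCAAG CACAGCCTGC GTC"),
}


def check_lengths(oligos):
    """Return {seq_id: True/False} for whether the stated length matches
    the number of bases and only A, C, G, T occur."""
    result = {}
    for seq_id, (stated, printed) in oligos.items():
        bases = printed.replace(" ", "")
        result[seq_id] = len(bases) == stated and set(bases) <= set("ACGT")
    return result


report = check_lengths(OLIGOS)
for seq_id, ok in sorted(report.items()):
    print(f"SEQ ID NO: {seq_id}: {'consistent' if ok else 'MISMATCH'}")
```

For the six oligonucleotides as printed, every stated length agrees with the base count.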



